import pandas as pd
import numpy as np
from lets_plot import *
# add the additional libraries you need to import for ML here
LetsPlot.setup_html(isolated_frame=True)
# Learn more about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html
# Include and execute your code here
neighborhoods = "https://github.com/byuidatascience/data4dwellings/raw/master/data-raw/dwellings_neighborhoods_ml/dwellings_neighborhoods_ml.csv"
dwellings = "https://github.com/byuidatascience/data4dwellings/raw/master/data-raw/dwellings_ml/dwellings_ml.csv"
# import your data here using pandas and the URL
neighborhoods = pd.read_csv(neighborhoods)
dwellings = pd.read_csv(dwellings)
This is a simple way of checking the accuracy and legitimacy of your data extraction by building classification models.
Create 2-3 charts that evaluate potential relationships between the home variables and before1980. Explain what you learn from the charts that could help a machine learning algorithm.
Put simply, most of the values sit at the lower end of the range, with a smaller number at the higher end.
# Include and execute your code here
# A histogram computes its own y (the count), so before1980 is used to split the panels instead
ggplot(dwellings, aes(x='livearea')) + \
    geom_histogram() + \
    facet_grid(y='before1980')
ggplot(dwellings, aes(x='sprice')) + \
    geom_histogram() + \
    facet_grid(y='before1980')
Build a classification model labeling houses as being built “before 1980” or “during or after 1980”. Your goal is to reach or exceed 90% accuracy. Explain your final model choice (algorithm, tuning parameters, etc) and describe what other models you tried.
I was only able to reach about 88% accuracy at most. I tried sale price and number of stories, but living area was the only variable that came close to 90%.
# Include and execute your code here
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
features = ['numbdrm', 'numbaths']
# Use the dataset's existing before1980 column as the target; livearea is square footage, not a build year
X = dwellings[features]
y = dwellings['before1980']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print(accuracy)
0.8874099934540693
Justify your classification model by discussing the most important features selected by your model. This discussion should include a feature importance chart and a description of the features.
I chose the number of bedrooms and the number of bathrooms because these are the features the model above uses. Looking at how the two relate to each other helps support the accuracy result and may point to ways of improving it.
# Include and execute your code here
ggplot(dwellings, aes(x='numbdrm', y='numbaths')) + \
    geom_boxplot()
Describe the quality of your classification model using 2-3 different evaluation metrics. You also need to explain how to interpret each of the evaluation metrics you use.
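The first metric is accuracy, which I also used in task 2: the share of all test houses that the model labels correctly. Precision shows how reliable the positive predictions are: of the houses predicted as built before 1980, the fraction that actually were. Recall is the fraction of the houses actually built before 1980 that the model identifies.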
from sklearn.metrics import accuracy_score, precision_score, recall_score
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)
print(accuracy)
print(precision)
print(recall)
0.8874099934540693
0.9098080462792533
0.9523809523809523
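To make the interpretation concrete, here is a minimal sketch of how the three metrics come out of the four confusion-matrix counts. The labels below are made up for illustration (they are not the model's actual predictions), and the variable names are hypothetical.

```python
# Hypothetical labels for illustration only: 1 = built before 1980, 0 = 1980 or later.
y_true_demo = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred_demo = [1, 1, 1, 0, 0, 1, 1, 0, 1, 0]

pairs = list(zip(y_true_demo, y_pred_demo))

# Count the four cells of the confusion matrix.
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # predicted before 1980, and it was
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # predicted 1980+, and it was
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # predicted before 1980, but it was not
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # predicted 1980+, but it was before 1980

accuracy_demo = (tp + tn) / len(pairs)   # share of all houses labeled correctly
precision_demo = tp / (tp + fp)          # how trustworthy a "before 1980" prediction is
recall_demo = tp / (tp + fn)             # how many true pre-1980 houses were found

print(accuracy_demo, precision_demo, recall_demo)
```

These are the same definitions that `accuracy_score`, `precision_score`, and `recall_score` apply for a binary target with 1 as the positive class.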
Repeat the classification model using 3 different algorithms. Display their Feature Importance, and Decision Matrix. Explain the differences between the models and which one you would recommend to the Client.
type your results and analysis here
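One possible way to approach this task is sketched below. It assumes that "Decision Matrix" means a confusion matrix, picks three tree-based scikit-learn classifiers as an arbitrary choice, and runs on synthetic stand-in data so the block is self-contained. The frame `demo`, its target rule, and the tuning values are all hypothetical; the real `dwellings` columns would replace them.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data (an assumption): swap in the real dwellings frame.
rng = np.random.default_rng(42)
n = 400
demo = pd.DataFrame({
    "livearea": rng.normal(1500, 400, n),
    "numbdrm": rng.integers(1, 6, n),
    "numbaths": rng.integers(1, 4, n),
})
# Hypothetical target loosely tied to living area so the models have some signal.
demo["before1980"] = (demo["livearea"] + rng.normal(0, 200, n) < 1500).astype(int)

demo_features = ["livearea", "numbdrm", "numbaths"]
Xd_train, Xd_test, yd_train, yd_test = train_test_split(
    demo[demo_features], demo["before1980"], test_size=0.2, random_state=42
)

# Three candidate algorithms; the tuning values are illustrative, not recommendations.
models = {
    "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, clf in models.items():
    clf.fit(Xd_train, yd_train)
    pred = clf.predict(Xd_test)
    results[name] = {
        "accuracy": accuracy_score(yd_test, pred),
        # Rows are actual classes (0, 1); columns are predicted classes (0, 1).
        "confusion_matrix": confusion_matrix(yd_test, pred, labels=[0, 1]),
        "importance": pd.Series(clf.feature_importances_, index=demo_features)
                        .sort_values(ascending=False),
    }
    print(name, round(results[name]["accuracy"], 3))
    print(results[name]["confusion_matrix"])
    print(results[name]["importance"])
```

Each `importance` series can be turned into a frame and drawn as a bar chart, for example with `geom_bar(stat='identity')`, to produce the feature importance chart the task asks for.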
# Include and execute your code here
Join the dwellings_neighborhoods_ml.csv data to the dwellings_ml.csv on the parcel column to create a new dataset. Duplicate the code for the stretch question above and update it to use this data. Explain the differences and if this changes the model you recommend to the Client.
type your results and analysis here
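The join itself could look something like the sketch below. It uses two tiny made-up frames in place of the real CSV files; the `nbhd_1` and `nbhd_2` column names and the parcel values are hypothetical. A left join is one reasonable choice because it keeps every home even when no neighborhood row matches.

```python
import pandas as pd

# Tiny hypothetical frames standing in for dwellings_ml.csv and
# dwellings_neighborhoods_ml.csv.
homes = pd.DataFrame({
    "parcel": ["p1", "p2", "p3"],
    "livearea": [1200, 2400, 1800],
    "before1980": [1, 0, 1],
})
hoods = pd.DataFrame({
    "parcel": ["p1", "p2", "p4"],
    "nbhd_1": [1, 0, 0],
    "nbhd_2": [0, 1, 1],
})

# Left join keeps every home. validate raises an error if parcel is duplicated
# on either side, and indicator adds a _merge column showing where each row came from.
joined = homes.merge(
    hoods, on="parcel", how="left", validate="one_to_one", indicator=True
)

# Homes with no matching neighborhood row get NaN in the nbhd_* columns.
unmatched = joined[joined["_merge"] == "left_only"]

print(joined)
print(len(unmatched), "home(s) without neighborhood data")
```

If the real files contain repeated parcel values, the `validate` check will fail; in that case calling `drop_duplicates(subset="parcel")` on each frame before merging is one option, though which duplicate to keep is a judgment call.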
# Include and execute your code here
Can you build a model that predicts the year a house was built? Explain the model and the evaluation metrics you would use to determine if the model is good.
type your results and analysis here
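Because the year built is a number rather than a category, this becomes a regression problem, and accuracy, precision, and recall no longer apply. A rough sketch of three commonly used regression metrics follows, computed by hand on made-up years so each formula is visible; the arrays are hypothetical and do not come from any fitted model.

```python
import numpy as np

# Hypothetical true and predicted build years, for illustration only.
year_true = np.array([1955, 1972, 1988, 2001, 1964])
year_pred = np.array([1960, 1970, 1980, 2005, 1964])

errors = year_pred - year_true

# Mean absolute error: the typical miss, in years.
mae = np.mean(np.abs(errors))

# Root mean squared error: also in years, but penalizes large misses more heavily.
rmse = np.sqrt(np.mean(errors ** 2))

# R squared: share of the variance in build year explained by the predictions
# (1.0 is perfect; 0.0 is no better than always guessing the mean year).
ss_res = np.sum(errors ** 2)
ss_tot = np.sum((year_true - year_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(mae, rmse, r2)
```

MAE is probably the easiest figure to explain to a client, since it reads directly as "off by about this many years on average".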
# Include and execute your code here